Network Model Selection for Task-Focused Attributed Network Inference
Networks are models representing relationships between entities. Often these
relationships are explicitly given; otherwise, we must learn a representation that
generalizes and predicts observed behavior in the underlying individual data (e.g.,
attributes or labels). Whether given or inferred, the choice of representation
affects subsequent tasks and questions on the network. This work
focuses on model selection to evaluate network representations from data,
concentrating on fundamental predictive tasks on networks. We present a modular
methodology using general, interpretable network models, task neighborhood
functions found across domains, and several criteria for robust model
selection. We demonstrate our methodology on three online user activity
datasets and show that selecting the network model for the appropriate network task,
rather than for an alternate task, increases performance by an order of magnitude in our
experiments.
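The selection loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' methodology: it assumes k-nearest-neighbor graphs over node attributes as the candidate network models and a majority vote over neighbors as the task neighborhood function, and the names `knn_network`, `neighbor_vote_accuracy`, and `select_model` are hypothetical.

```python
import math
from collections import Counter


def knn_network(points, k):
    """Candidate network model: link each node to its k nearest nodes (Euclidean)."""
    edges = {}
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        edges[i] = [j for _, j in ranked[:k]]
    return edges


def neighbor_vote_accuracy(edges, labels):
    """Task: predict each node's label as the majority label of its neighbors."""
    correct = 0
    for i, neighbors in edges.items():
        predicted, _ = Counter(labels[j] for j in neighbors).most_common(1)[0]
        correct += predicted == labels[i]
    return correct / len(edges)


def select_model(points, labels, candidate_ks):
    """Score every candidate network on the task and keep the best-scoring one."""
    scores = {k: neighbor_vote_accuracy(knn_network(points, k), labels)
              for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Two well-separated clusters of three nodes each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["a", "a", "a", "b", "b", "b"]
print(select_model(points, labels, [1, 2, 5]))  # (1, {1: 1.0, 2: 1.0, 5: 0.0})
```

Small k keeps each neighborhood inside its cluster and scores perfectly, while k = 5 pulls in the other cluster and the vote fails; scoring each candidate on the task that will actually be run is what separates otherwise plausible representations.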
Honesty is the Best Policy: On the Accuracy of Apple Privacy Labels Compared to Apps' Privacy Policies
Apple introduced privacy labels in Dec. 2020 as a way for developers
to report the privacy behaviors of their apps. While Apple does not validate
labels, it does require developers to also provide a privacy policy, which
offers an important comparison point. In this paper, we applied the NLP
framework of Polisis to extract features of the privacy policies of 515,920 apps
on the iOS App Store, comparing the output to the privacy labels. We identify
discrepancies between the policies and the labels, particularly as they relate
to data collected that is linked to users. We find that the privacy policies of
287K apps may indicate more data collection linked to users than is reported
in the privacy labels. More alarming, a large share (97%) of the apps that have
a "Data Not Collected" privacy label have a privacy policy that indicates
otherwise. We provide insights into potential sources of the discrepancies,
including the use of templates and confusion around Apple's definitions and
requirements. These results suggest that there is still significant work to be
done to help developers label their apps more accurately. Incorporating a
Polisis-like system as a first-order check can help improve the current state
and better inform developers when privacy labels may have been misapplied.
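A first-order check of the kind suggested above can be approximated by set differences between the data types a policy classifier flags and those a label reports. This is a hypothetical sketch: it assumes policy features have already been extracted (for example, by a Polisis-like model) into sets of data-type names, and the `App` record, its field names, and `flag_apps` are illustrative, not Polisis's or Apple's API.

```python
from dataclasses import dataclass


@dataclass
class App:
    """Hypothetical per-app record; the field names are illustrative."""
    name: str
    label_linked: set[str]   # data types the privacy label reports as linked to the user
    policy_linked: set[str]  # data types the policy text suggests are linked to the user
    data_not_collected: bool = False  # app carries the "Data Not Collected" label


def discrepancies(app: App) -> set[str]:
    """Data types the policy suggests are collected and linked, but the label omits."""
    return app.policy_linked - app.label_linked


def flag_apps(apps: list[App]) -> tuple[list[str], float]:
    """Return the names of apps with any discrepancy, plus the share of
    "Data Not Collected" apps whose policy indicates linked collection."""
    flagged = [app.name for app in apps if discrepancies(app)]
    dnc = [app for app in apps if app.data_not_collected]
    contradicted = sum(1 for app in dnc if app.policy_linked)
    share = contradicted / len(dnc) if dnc else 0.0
    return flagged, share


apps = [
    App("weather", {"location"}, {"location", "contact_info"}),
    App("notes", set(), set(), data_not_collected=True),
    App("game", set(), {"identifiers"}, data_not_collected=True),
]
print(flag_apps(apps))  # (['weather', 'game'], 0.5)
```

A flag here only means the two sources disagree; as the abstract notes, templated policies can overstate collection, so a flagged app warrants review rather than an automatic verdict.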
SoK: A Data-driven View on Methods to Detect Reflective Amplification DDoS Attacks Using Honeypots
In this paper, we revisit the use of honeypots for detecting reflective
amplification attacks. These measurement tools require careful design of both
data collection and data analysis, including cautious threshold inference. We
survey common amplification honeypot platforms as well as the underlying
methods to infer attack detection thresholds and to extract knowledge from the
data. By systematically exploring the threshold space, we find most honeypot
platforms produce comparable results despite their different configurations.
Moreover, by applying data from a large-scale honeypot deployment, network
telescopes, and a real-world baseline obtained from a leading DDoS mitigation
provider, we question the fundamental assumption of honeypot research that
convergence of observations can imply their completeness. Finally, we
derive guidance on precise, reproducible honeypot research, and present open
challenges.
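Threshold-based detection, and the systematic exploration of the threshold space mentioned above, can be illustrated with a per-source request count. A minimal sketch, assuming the honeypot logs one spoofed source IP per request within a single observation window and declares an attack when a source reaches a fixed request threshold; the function names, the window model, and the example IPs (taken from documentation address ranges) are hypothetical, and real platforms differ in how they key and time-bound these counts.

```python
from collections import Counter


def detect_attacks(requests: list[str], threshold: int) -> set[str]:
    """Sources (presumed spoofed victim IPs) whose request count in one window
    reaches the detection threshold."""
    counts = Counter(requests)
    return {src for src, n in counts.items() if n >= threshold}


def sweep_thresholds(requests: list[str], thresholds: list[int]) -> dict[int, int]:
    """How many attacks are detected at each candidate threshold."""
    return {t: len(detect_attacks(requests, t)) for t in thresholds}


# Example window: one heavy, one moderate, and one light source.
window = ["198.51.100.7"] * 120 + ["203.0.113.9"] * 30 + ["192.0.2.1"] * 2
print(sweep_thresholds(window, [1, 10, 100, 1000]))  # {1: 3, 10: 2, 100: 1, 1000: 0}
```

Sweeping the threshold makes the trade-off explicit: low values count scanners and stray traffic as attacks, while high values miss smaller attacks.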
Hiding in Plain Sight: A Longitudinal Study of Combosquatting Abuse
Domain squatting is a common adversarial practice where attackers register
domain names that are purposefully similar to popular domains. In this work, we
study a specific type of domain squatting called "combosquatting," in which
attackers register domains that combine a popular trademark with one or more
phrases (e.g., betterfacebook[.]com, youtube-live[.]com). We perform the first
large-scale, empirical study of combosquatting by analyzing more than 468
billion DNS records, collected from passive and active DNS data sources over
almost six years. We find that almost 60% of abusive combosquatting domains
live for more than 1,000 days, and even worse, we observe increased activity
associated with combosquatting year over year. Moreover, we show that
combosquatting is used to perform a spectrum of different types of abuse
including phishing, social engineering, affiliate abuse, trademark abuse, and
even advanced persistent threats. Our results suggest that combosquatting is a
real problem that requires increased scrutiny by the security community.Comment: ACM CCS 1
One Thing Leads to Another: Credential Based Privilege Escalation
A user's primary email account, in addition to being an easy point of contact in our online world, is increasingly being used as a single point of failure for all web security. Features like unlimited message storage, numerous weak password reset features, and economically enticing spoils (in the form of financial accounts or personal photos) all add up to an environment where overthrowing someone's life via their primary email account is increasingly likely and damaging. We describe an attack we call credential based privilege escalation, and a methodology to evaluate this attack's potential for user harm at web scale. In a study of over 9,000 users, we find that, unsurprisingly, access to a vast number of online accounts can be gained by breaking into a user's primary email account (even without knowing the email account's password), but that the monetizable value in a typical account is nonetheless relatively low. We also describe future directions in understanding both the technical and human aspects of credential based privilege escalation.
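Credential based privilege escalation, as described above, can be modeled as reachability over password-reset relationships: once an attacker controls an account that receives reset messages for other accounts, those accounts fall too. The sketch below is a hypothetical illustration, not the paper's evaluation methodology; the account names and the `reset_via` mapping are made up, and it ignores real-world barriers such as second factors or security questions.

```python
from collections import deque


def reachable_accounts(reset_via: dict[str, list[str]], start: str) -> set[str]:
    """Accounts an attacker can take over from one compromised account.

    reset_via maps each account to the accounts whose password-reset messages
    it receives; takeover is modeled as breadth-first reachability.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        account = queue.popleft()
        for target in reset_via.get(account, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    seen.discard(start)
    return seen


# Hypothetical reset relationships for one user.
resets = {
    "primary_email": ["bank", "cloud_photos", "secondary_email"],
    "secondary_email": ["shopping"],
}
print(sorted(reachable_accounts(resets, "primary_email")))
# ['bank', 'cloud_photos', 'secondary_email', 'shopping']
```

The `shopping` account is reached only through `secondary_email`, which is the chaining that makes a primary email account a single point of failure.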
Cloudsweeper: Enabling Data-Centric Document Management for Secure Cloud Archives
Cloud-based storage accounts like web email are compromised on a daily basis. At the same time, billions of Internet users store private information in these accounts. As the Internet matures and these accounts accrue more information, they become a single point of failure for both users' online identities and large amounts of their private information. This paper presents two contributions. The first, the heterogeneous documents abstraction, is a data-centric strategy for protecting high-value information stored in globally accessible storage. The second, Cloudsweeper, is an implementation of the heterogeneous documents strategy as a cloud-based email protection system. Cloudsweeper gives users the opportunity to remove or “lock up” sensitive, unexpected, and rarely used information to mitigate the risks of cloud storage accounts without sacrificing the benefits of cloud storage or computation. We show that Cloudsweeper can efficiently assist users in pinpointing and protecting passwords emailed to them in cleartext. We present performance measurements showing that the system can rewrite past emails stored at cloud providers quickly, along with initial results regarding user preferences for redacted cloud storage.
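Pinpointing and redacting cleartext passwords in stored email can be approximated by a keyword heuristic. The regular expression below is a hypothetical stand-in, not Cloudsweeper's detector: it only catches text of the form "password: X" or "password is X", the names `PASSWORD_PATTERN`, `find_passwords`, and `redact` are illustrative, and it redacts in place rather than encrypting ("locking up") the value.

```python
import re

# Hypothetical heuristic: a password keyword, then ':' or 'is', then a token.
PASSWORD_PATTERN = re.compile(
    r"(?P<key>\b(?:password|passwd|pwd)\b\s*(?::|is)\s*)(?P<secret>\S+)",
    re.IGNORECASE,
)


def find_passwords(body: str) -> list[str]:
    """Return candidate cleartext passwords found in an email body."""
    return [m.group("secret") for m in PASSWORD_PATTERN.finditer(body)]


def redact(body: str, placeholder: str = "[REDACTED]") -> str:
    """Replace each candidate password with a placeholder, keeping its keyword."""
    return PASSWORD_PATTERN.sub(lambda m: m.group("key") + placeholder, body)


message = "Username: alice\nPassword: hunter2\nThanks"
print(find_passwords(message))  # ['hunter2']
print(redact(message))
```

Keeping the keyword and replacing only the token leaves the rest of the message readable, which matches the goal of protecting high-value fragments without discarding whole emails.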